A brief analysis of QQ messages

A message-collector bot was developed around October 2017 and started collecting messages on 29/10/2017.

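Key findings:

  • Activity peaks at 11:00 and 15:00 each day
  • Wednesday is the most active day of the week
  • Linzhi (林芝) is the number-one most active member
  • Fuduji (复读机, "the repeater") is Linzhi's best buddy, by an overwhelming margin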

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
chat_file = 'data/chat_hist_20180101.csv'

In [ ]:
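# read all data, skipping malformed lines
# (on_bad_lines='skip' needs pandas >= 1.3; it replaces error_bad_lines=False, removed in pandas 2.0)
df = pd.read_csv(chat_file, on_bad_lines='skip')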
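
To see what skipping bad lines does in practice, here is a minimal sketch on a made-up three-row CSV held in memory. The sample rows are invented for illustration; only the column names come from this notebook. It assumes pandas 1.3 or newer, where `on_bad_lines` is available.

```python
import io

import pandas as pd

# Made-up sample: the second data row has too many fields and cannot be parsed.
sample_csv = (
    "created_at,sender_qq,message_text\n"
    "2017-10-29 06:24:45,1001,hello\n"
    "2017-10-29 06:25:00,1002,too,many,fields\n"
    "2017-10-29 06:26:10,1003,bye\n"
)

# Rows with more fields than the header are dropped instead of raising an error.
demo_df = pd.read_csv(io.StringIO(sample_csv), on_bad_lines='skip')
print(len(demo_df))
```

Comparing the row count against the raw line count of the file is a quick way to check how many messages were lost this way.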

Preprocessing


In [32]:
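# drop all rows with null values; copy so that later column assignments
# do not trigger pandas' SettingWithCopyWarning
non_na_df = df.dropna().copy()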

In [34]:
# create timestamp index
index = pd.to_datetime(non_na_df['created_at'])
non_na_df.index = index

In [78]:
# overall info
print('Total records {}'.format(len(non_na_df)))
print('Start / End : {}, {}'.format(non_na_df.index.min(), non_na_df.index.max()))


Total records 65184
Start / End : 2017-10-29 06:24:45, 2017-12-31 16:51:29

In [ ]:
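# shift the hour by +10; presumably this converts the recorded timestamps to local time (UTC+10)
non_na_df['hour'] = (non_na_df.index.hour + 10) % 24
# note: the day of week is taken from the unshifted timestamps
non_na_df['dayofweek'] = non_na_df.index.dayofweek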
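
One thing to watch with the cell above: only the hour is shifted, so a message sent late in the evening keeps the previous day's weekday. A possible alternative, sketched here under the assumption that the +10 is a fixed offset from the recorded time to local time, is to shift the whole timestamp first and derive both fields from it. The two sample timestamps are the start and end times printed earlier.

```python
import pandas as pd

# Start and end timestamps from the overall info above, as recorded.
recorded = pd.to_datetime(['2017-10-29 06:24:45', '2017-12-31 16:51:29'])

# Assumption: local time is a fixed 10 hours ahead of the recorded time.
shifted = recorded + pd.Timedelta(hours=10)

# Hour and weekday now come from the same shifted timestamps.
demo = pd.DataFrame({'hour': shifted.hour, 'dayofweek': shifted.dayofweek},
                    index=shifted)
print(demo)
```

For the last record, the hour-only shift would report hour 2 on a Sunday, while the shifted timestamp falls on Monday.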

Chat activity by hour


In [72]:
non_na_df.groupby('hour').count()['sender_qq'].plot(kind='bar', figsize=(18,6))
plt.tight_layout()
plt.title('Activity by Hour')
plt.ylabel('Total Chat Count')
plt.xlabel('Hour (24h)')


Out[72]:
<matplotlib.text.Text at 0x1221b2ac8>

Chat activity by day of week


In [74]:
non_na_df.groupby('dayofweek').count()['sender_qq'].plot(kind='bar')
plt.tight_layout()
plt.title('Activity by Day of Week')
plt.ylabel('Total Chat Count')
plt.xlabel('Day of Week')


Out[74]:
<matplotlib.text.Text at 0x12215fe48>
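
The bars above are labelled 0 to 6, following pandas' `dayofweek` convention where Monday is 0. A small sketch of how the counts could be relabelled with day names before plotting; the counts below are hypothetical, not taken from the chat data.

```python
import pandas as pd

DAY_NAMES = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Hypothetical counts keyed by dayofweek (Monday=0); day 5 is missing on purpose.
counts = pd.Series({0: 12, 1: 7, 2: 30, 3: 9, 4: 11, 6: 4})

# Make sure all seven days are present, then swap the integer labels for names.
labelled = counts.reindex(range(7), fill_value=0)
labelled.index = DAY_NAMES
print(labelled)
```

Calling `labelled.plot(kind='bar')` would then show the day names on the x axis.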

Top chatters over the whole period


In [82]:
# top 20 chatters over the whole period, by message count
top_chatter = non_na_df['sender_card'].value_counts()
top_chatter.nlargest(20).plot(kind='bar', figsize=(14,6))


Out[82]:
<matplotlib.axes._subplots.AxesSubplot at 0x122b5aef0>

Finding best buddies...


In [83]:
# find all messages that start with @someone; copy so a column can be added later
at_messages = non_na_df[non_na_df.message_text.str.startswith('@')].copy()

In [87]:
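# extract the name card of whoever is mentioned at the start of the message
def get_atee(m):
    # the first whitespace-delimited token is the mention, e.g. '@name';
    # a name card containing spaces is cut at the first space
    if m.startswith('@'):
        return m.split()[0]
    return 'N/A'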
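
The same extraction can also be written with a regular expression. This is only a sketch of an alternative, not the notebook's method: `extract_mention` and `MENTION_RE` are hypothetical names, and unlike `get_atee` it strips the leading `@` and returns `None` when there is no mention.

```python
import re

# A leading '@' followed by one or more non-whitespace characters.
MENTION_RE = re.compile(r'^@(\S+)')


def extract_mention(text):
    """Return the name mentioned at the start of text, or None."""
    if not isinstance(text, str):
        return None
    match = MENTION_RE.match(text)
    return match.group(1) if match else None


print(extract_mention('@Ade-Lynch good morning'))
print(extract_mention('no mention here'))
```

It can be applied the same way, for example `at_messages.message_text.apply(extract_mention)`.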

In [ ]:
at_messages['who'] = at_messages.message_text.apply(get_atee)

In [94]:
at_messages.who.value_counts().nlargest(20).plot(kind='bar', figsize=(18,6))


Out[94]:
<matplotlib.axes._subplots.AxesSubplot at 0x12396edd8>

In [105]:
who_at_lynch = at_messages[at_messages['who'].str.startswith('@Ade-Lynch')]

In [108]:
who_at_lynch['sender_card'].value_counts().nlargest(5).plot(kind='bar', figsize=(10,10))


Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0x123aee898>
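
The filter above looks at one mentioned person at a time. As a rough sketch of how the same question could be asked for everyone at once, the (mentioned, sender) pairs can be counted with a single `groupby`. The rows below are made up; only the column names `sender_card` and `who` come from this notebook.

```python
import pandas as pd

# Made-up mentions: who sent the message and who was mentioned.
demo = pd.DataFrame({
    'sender_card': ['A', 'B', 'A', 'C', 'A'],
    'who': ['@X', '@X', '@X', '@Y', '@Y'],
})

# Count each (mentioned, sender) pair and put the most frequent pair first.
pair_counts = (demo.groupby(['who', 'sender_card'])
                   .size()
                   .sort_values(ascending=False))
print(pair_counts.head())
```

The first entry of `pair_counts` is the pair with the most mentions, which is the "best buddy" relation in the sense used here.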